Element-by-element solution strategies are developed for transient heat
conduction problems. Results of numerical tests indicate the effectiveness of
the procedures proposed. The small data base requirements and attractive
architectural features of the algorithms suggest considerable potential for
solving large-scale problems.
Appendix 1: Derivation of Linear Algebraic Systems in the Finite
Element Analysis of Heat Conduction Problems


1. Introduction

The first example of an element-by-element (EBE) algorithm for heat
conduction was by Hughes, Levit and Winget [H4]. In that work the EBE concept
was used to develop a non-iterative, second-order time-accurate,
unconditionally stable transient algorithm for both linear and nonlinear
problems. Element arrays could be processed individually with no need to
construct a global coefficient matrix. Our initial numerical testing with this
scheme proved satisfactory. However, later on we discovered that under certain
circumstances the accuracy level attained by typical globally implicit methods,
e.g. the Crank-Nicolson procedure, was not attained by the method of [H4].
The problem was traced to spatial truncation error terms such as those which
afflict some classical split-operator finite difference methods, such as the
DuFort-Frankel method [A1]. To overcome these accuracy deficiencies we were
led to reformulate the EBE procedure as an iterative linear equation solver so
that standard time discretization techniques could be employed. In this way
issues of stability and accuracy are obviated. The only question which remains
is how fast the iterative process converges. At the same time the small
data base and attractive architectural features of the EBE process are retained.
Another advantage which accrues is that coupled capacity matrices may be
accommodated. This improves upon [H4], which was restricted to lumped capacity.

On the other hand, relegating the EBE concept to iterative linear equation
solving does not seem to exploit its full potential. To the authors, this
represents a conservative, interim strategy. In future research we hope to
explore the use of EBE concepts throughout the entire problem solving spectrum.
There already seem to be clear paths to significant increases of efficiency in
large, nonlinear problems by adopting this philosophy. It is interesting to
note that the chronology of research developments in multigrid techniques
followed along similar lines in that initial success was found in iterative
linear equation solving, but subsequent improvement was obtained by procedures
in which multigrid concepts permeated all aspects of the solution process
(Brandt [B2]).

By restricting the use of the EBE concept to iterative equation solving
the research problem is rendered tractable in that other aspects of solution
can be done by standard means. Despite this fact there still appears to be a
great deal of variety in the types of EBE strategies which may be developed.

Basically, three main ingredients are necessary for an EBE iterative linear
equation solver. They are an iterative driver strategy, an EBE approximate
factorization scheme, and the definition of an array which approximates the
global coefficient matrix and is amenable to EBE approximate factorization.
These topics are explored in Sections 2 to 4, respectively. In Section 5,
sample problems are presented.

For related developments in the area of structural analysis, the
interested reader is urged to consult [H5, H10, H11, N1, O1]. A pilot study of a
transonic flow, involving an unsymmetric coefficient matrix, is presented in
[H11]. Because the present thrust of research in EBE techniques has been
concerned with aspects of linear equation solving, a certain synthesis of
concepts has ensued. This, as remarked above, is believed to be a very
temporary state of affairs. To further develop EBE algorithms which are truly
effective for different problem classes, the physics and idiosyncrasies of the
individual classes will need to be accounted for in the structure of the
algorithms. For example, we may contrast the transient heat conduction and
structural schemes presented in [H4] and [O1], respectively. The heat
conduction scheme was formulated in terms of temperature degrees-of-freedom in a
very natural way. No structural analog in terms of kinematical variables
could be developed which attained unconditional stability. Rather, an entirely
new global formulation had to be created with stresses and velocities as
primary unknowns. Needless to say, the developmental implications of such schemes
are significant. A unique procedure of this kind requires considerable research
on all levels to be brought to fruition. We anticipate this being the case for
the various problem classes to which the EBE concept will undoubtedly be applied.
2. Iterative Algorithms

Two candidate iterative algorithms, which can be used in conjunction with
approximately factorized arrays, are described below.

2a. Parabolic Regularization

The parabolic regularization (PR) algorithm is derived in [H5]. Table 1
presents a flowchart of the procedure for symmetric positive-definite systems.

Table 1. Flowchart of the parabolic regularization algorithm with line
search and BFGS updates

Step 1. Initialization:

    $m = 0$, $x_0 = 0$, $r_0 = b$
    $\gamma_k = \delta_k = 0$   (loop: $k = 1, 2, \ldots, n_{BFGS}$)
    $\Delta x = B^{-1} r_0$

Step 2. Line search:

    $s = \Delta x^T r_m / \Delta x^T A \, \Delta x$

Step 3. Convergence check:

    $\| r_{m+1} \| < \delta$ ?
    Yes: Return
    No:  Continue

Step 4. Relabel old BFGS vectors:

    $\gamma_{k-1} = \gamma_k$
    $\delta_{k-1} = \delta_k$   (loop: $k = 2, 3, \ldots, n_{BFGS}$)

Step 5. Calculate new BFGS vectors:

    (expressions illegible in the source)

Step 6. New search direction:

    $z = r_{m+1}$
    $z \leftarrow z + (\gamma_k^T z)\, \delta_k$   (loop: $k = n_{BFGS}, \ldots, 1$)
    $z \leftarrow B^{-1} z$
    $z \leftarrow z + (\delta_k^T z)\, \gamma_k$   (loop: $k = 1, \ldots, n_{BFGS}$)
    $\Delta x = z$

Step 7. $m \leftarrow m + 1$, go to Step 2.

The notation in Table 1 is given as follows: $m$ is the iteration counter;
the $\gamma_k$'s and $\delta_k$'s are the BFGS vectors; $n_{BFGS}$ is the
maximum number of BFGS vectors allowed; $B$ is a matrix which approximates $A$,
but is more easily factorized; $s$ is the search parameter; $x_m$ is the $m$th
approximation of $x$; $r_m = b - A x_m$ is the corresponding residual;
$\| r_m \|$ is its Euclidean length; and $\delta$ is a preassigned error
tolerance. The search parameter in Step 2 is determined by minimizing the
potential energy

$$P(s) = -(x_m + s\,\Delta x)^T \left( b - \tfrac{1}{2} A (x_m + s\,\Delta x) \right) \tag{2.1}$$
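As a brief check of the line search in Step 2, the search parameter can be recovered by differentiating the potential energy (2.1) with respect to $s$. The worked steps below are a sketch that assumes $A$ is symmetric positive-definite and uses the residual $r_m = b - A x_m$; they are a verification added for the reader rather than part of the original derivation.

```latex
\begin{aligned}
P(s) &= -(x_m + s\,\Delta x)^T b
        + \tfrac{1}{2}\,(x_m + s\,\Delta x)^T A\,(x_m + s\,\Delta x), \\
\frac{dP}{ds} &= -\Delta x^T b + \Delta x^T A\,(x_m + s\,\Delta x)
              = -\Delta x^T r_m + s\,\Delta x^T A\,\Delta x, \\
\frac{dP}{ds} = 0 \;&\Longrightarrow\;
   s = \frac{\Delta x^T r_m}{\Delta x^T A\,\Delta x}.
\end{aligned}
```

Because $d^2P/ds^2 = \Delta x^T A\,\Delta x > 0$ for any nonzero $\Delta x$, the stationary point is indeed a minimum.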


2b. Preconditioned Conjugate Gradients

This algorithm is a generalization of the classical conjugate gradients (CG)
method (see Hestenes-Stiefel [H1]) in which a "preconditioning" is performed
by means of $B$, the matrix approximating $A$. The algorithm is summarized in
Table 2.


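Table 2. Flowchart of the preconditioned conjugate gradients algorithm

Step 1. Initialization:

    $m = 0$
    $x_0 = 0$
    $r_0 = b$
    $p_0 = z_0 = B^{-1} r_0$

Step 2. $\alpha_m = r_m^T z_m / p_m^T A \, p_m$

Step 3. $x_{m+1} = x_m + \alpha_m p_m$

Step 4. $r_{m+1} = r_m - \alpha_m A \, p_m$

Step 5. Convergence check:

    $\| r_{m+1} \| < \delta$ ?
    Yes: Return
    No:  Continue

Step 6. $z_{m+1} = B^{-1} r_{m+1}$

Step 7. $\beta_m = r_{m+1}^T z_{m+1} / r_m^T z_m$

Step 8. $p_{m+1} = z_{m+1} + \beta_m p_m$

Step 9. $m \leftarrow m + 1$, go to Step 2.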


Glowinski et al. [G1, G2] (see also references therein) have successfully
employed the preconditioned conjugate gradients method in their work. The
matrix $B$ which they employ as a preconditioner is determined by way of
various "incomplete Cholesky factorizations" (see e.g. Thomasset [T1] and
references therein).

Remark. A fixed number of vectors is all that is needed in the CG method.
This makes it computationally more attractive than the PR algorithm with BFGS
updates, because a considerable number of BFGS vectors typically need to be
stored.
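The steps of Table 2 translate almost line for line into code. The following is a minimal sketch in plain Python, assuming dense list-of-lists storage and a caller-supplied routine `solve_B` that returns $B^{-1} r$. The helper names (`dot`, `matvec`, `pcg`, `jacobi_solver`) and the small test system are illustrative choices, not taken from the report, and the Jacobi choice $B = \mathrm{diag}(A)$ is used only because it is the simplest preconditioner to write down.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def pcg(A: Matrix, b: Vector, solve_B: Callable[[Vector], Vector],
        tol: float = 1e-10, max_iter: int = 100) -> Tuple[Vector, int]:
    """Preconditioned conjugate gradients, following Steps 1-9 of Table 2."""
    x = [0.0] * len(b)              # Step 1: x_0 = 0
    r = list(b)                     #         r_0 = b
    z = solve_B(r)                  #         z_0 = B^{-1} r_0
    p = list(z)                     #         p_0 = z_0
    rz = dot(r, z)
    for m in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                             # Step 2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]       # Step 3
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]      # Step 4
        if dot(r, r) ** 0.5 < tol:                          # Step 5
            return x, m + 1
        z = solve_B(r)                                      # Step 6
        rz_new = dot(r, z)
        beta = rz_new / rz                                  # Step 7
        p = [zi + beta * pi for zi, pi in zip(z, p)]        # Step 8
        rz = rz_new                                         # Step 9: next m
    return x, max_iter


def jacobi_solver(A: Matrix) -> Callable[[Vector], Vector]:
    """Return the map r -> B^{-1} r for the Jacobi choice B = diag(A)."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


# Small symmetric positive-definite example (hypothetical data).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b, jacobi_solver(A))
print(x, iterations)
```

Only the vectors `x`, `r`, `z`, `p` and `Ap` are kept between iterations, which is the fixed-storage property noted in the remark above. Any of the approximate factorizations of Section 3 could be substituted for `jacobi_solver` by supplying a different `solve_B`.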


3. Approximate Factorization

The convergence rate of the algorithms presented in the preceding section
depends heavily upon the approximating matrix $B$. It may be noted that if
$B = A$ then both algorithms immediately obtain the exact solution $x$.
Numerous choices for $B$ are possible. To explore some of the possibilities we
shall introduce the following notational scheme. Let

$$A = L_p(A)\, D_p(A)\, U_p(A) \quad \text{(product decomposition)} \tag{3.1}$$

$$A = L_s(A) + D_s(A) + U_s(A) \quad \text{(sum decomposition)} \tag{3.2}$$


where the subscripts $p$ and $s$ indicate "product" and "sum", respectively.

Equation (3.1) represents the Crout factorization. Thus $L_p$ and $U_p$ are
lower and upper triangular matrices, respectively, with diagonal entries equal
to 1, and $D_p(A)$ is a diagonal matrix. If $A$ is symmetric, then
$L_p(A) = U_p(A)^T$. If the entries of $D_p$ are nonnegative, then we can write

$$A = \tilde{L}_p(A)\, \tilde{U}_p(A) \tag{3.3}$$

where

$$\tilde{L}_p = L_p D_p^{1/2} \tag{3.4}$$

$$\tilde{U}_p = D_p^{1/2} U_p \tag{3.5}$$

When $A$ is symmetric positive-definite, (3.3)-(3.5) define the Cholesky, or
square-root, factorization.

In equation (3.2), $L_s$ and $U_s$ are lower and upper triangular matrices
with diagonal entries equal to 0, and $D_s$ is diagonal. In analogy with the
product decomposition, we may write

$$A = \tilde{L}_s(A) + \tilde{U}_s(A) \tag{3.6}$$

where

$$\tilde{L}_s = L_s + \tfrac{1}{2} D_s \tag{3.7}$$

$$\tilde{U}_s = U_s + \tfrac{1}{2} D_s \tag{3.8}$$

If $A$ is symmetric, then $L_s(A) = U_s(A)^T$ and
$\tilde{L}_s(A) = \tilde{U}_s(A)^T$.

The decomposition (3.6)-(3.8) has figured in the transient analysis
algorithms developed by Trujillo [T2, T3] and subsequently discussed by Park [P1].

Note that the net total storage required for the sum decomposition
is exactly the same as for the original matrix. However, the product
decomposition entails increased storage due to "fill-in" of zeros within the
skyline. This is perhaps the major drawback of direct solution schemes such as
Crout elimination.
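The storage remark can be made concrete with a small example. The sketch below, in plain Python with dense list-of-lists storage, computes the product decomposition (3.1) without pivoting and the sum decomposition (3.2) for a hypothetical 3 x 3 "arrow" matrix. The function names and the matrix are illustrative assumptions; the point is only that a zero entry of $A$ can become nonzero in $L_p(A)$, whereas the sum decomposition merely re-labels the existing entries.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def crout_ldu(A: Matrix) -> Tuple[Matrix, List[float], Matrix]:
    """Product decomposition A = L D U with unit-diagonal L and U (no pivoting)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] * D[k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * D[k] * U[k][j] for k in range(j))) / D[j]
            U[j][i] = (A[j][i] - sum(L[j][k] * D[k] * U[k][i] for k in range(j))) / D[j]
    return L, D, U


def sum_decomposition(A: Matrix) -> Tuple[Matrix, List[float], Matrix]:
    """Sum decomposition A = L + D + U with zero-diagonal triangular L and U."""
    n = len(A)
    L = [[A[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    D = [A[i][i] for i in range(n)]
    U = [[A[i][j] if i < j else 0.0 for j in range(n)] for i in range(n)]
    return L, D, U


def ldu_product(L: Matrix, D: List[float], U: Matrix) -> Matrix:
    """Multiply the three factors back together to check the decomposition."""
    n = len(D)
    return [[sum(L[i][k] * D[k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical matrix whose (2, 1) entry is zero.
A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
Lp, Dp, Up = crout_ldu(A)
Ls, Ds, Us = sum_decomposition(A)
print("A[2][1] =", A[2][1], " L_s[2][1] =", Ls[2][1], " L_p[2][1] =", Lp[2][1])
```

In this example the (2, 1) entry is zero in $A$ and in $L_s(A)$ but nonzero in $L_p(A)$, which is the fill-in referred to above.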

If we ignore the line search and quasi-Newton update ingredients of
the PR algorithm, then classical iterative algorithms are obtained by choosing
$B$ as follows:

$$B = D_s(A) \quad \text{(Jacobi method)} \tag{3.9}$$

$$B = L_s(A) + D_s(A) \quad \text{(Gauss-Seidel method)} \tag{3.10}$$

To describe the procedures that are emphasized herein, we first consider
$\tilde{A}$, written in the following form:

$$\tilde{A} = W^{1/2} (I + \epsilon \bar{A}) W^{1/2} \tag{3.11}$$

where $I$ is the identity matrix, $W$ is a positive-definite diagonal matrix,
$\epsilon$ is a scalar, and $\bar{A}$ is a matrix which has the same sparsity
pattern as $A$. $\tilde{A}$ is to be thought of as an approximation of $A$.
Specific choices of $W$, $\epsilon$ and $\bar{A}$ are considered later in
Section 4. The second and final stage of the approximation is to define

$$B = W^{1/2}\, C\, W^{1/2} \tag{3.12}$$

where $C$ is an approximation of $I + \epsilon \bar{A}$. Various choices are
considered below.


3a. Two-component Splitting

Let $\bar{A}$ be decomposed as follows:

$$\bar{A} = \bar{A}_1 + \bar{A}_2 \tag{3.13}$$

Then a possible definition of $C$ is

$$C = (I + \epsilon \bar{A}_1)(I + \epsilon \bar{A}_2) = I + \epsilon \bar{A} + \epsilon^2 \bar{A}_1 \bar{A}_2 = I + \epsilon \bar{A} + O(\epsilon^2) \tag{3.14}$$

The last term indicates the nature of the approximation. Computational
simplicity is gained if $\bar{A}_1$ and $\bar{A}_2$ are very sparse and more
easily factorized than $\bar{A}$.

For example, let

$$\bar{A}_1 = \tilde{L}_s(\bar{A}) \tag{3.15}$$

$$\bar{A}_2 = \tilde{U}_s(\bar{A}) \tag{3.16}$$

Thus $B$ has the following simple form:

$$B = W^{1/2} (I + \epsilon \tilde{L}_s(\bar{A})) (I + \epsilon \tilde{U}_s(\bar{A})) W^{1/2} \tag{3.17}$$

As may be seen, $B$ is already factored and the factors require no more storage
than that for $A$. Only diagonal scaling, and forward reductions and back
substitutions with sparse triangular arrays are needed to solve equations with
$B$ as coefficient matrix. This eliminates the cost of factorization and
obviates the storage penalties due to "fill-in". Equation (3.17) represents a
symmetrized Gauss-Seidel type approximate factorization.
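The solution procedure just described for (3.17), namely diagonal scaling, one forward reduction, one back substitution and a second diagonal scaling, can be sketched as follows. This is a minimal dense illustration in plain Python, assuming $B = W^{1/2}(I + \epsilon \tilde{L}_s(\bar{A}))(I + \epsilon \tilde{U}_s(\bar{A}))W^{1/2}$ with $W$ stored as a list of diagonal entries. The function names, the choice $\epsilon = 1$ and the sample matrix are hypothetical, and a practical code would exploit sparsity rather than loop over full rows.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def split_sum(Abar: Matrix) -> Tuple[Matrix, Matrix]:
    """Return the triangular parts with half the diagonal each, as in (3.7)-(3.8)."""
    n = len(Abar)
    Lt = [[Abar[i][j] if i > j else (0.5 * Abar[i][i] if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    Ut = [[Abar[i][j] if i < j else (0.5 * Abar[i][i] if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    return Lt, Ut


def solve_B(W: Vector, eps: float, Abar: Matrix, r: Vector) -> Vector:
    """Solve B z = r using scaling, forward reduction and back substitution."""
    n = len(r)
    Lt, Ut = split_sum(Abar)
    y = [r[i] / math.sqrt(W[i]) for i in range(n)]        # W^{-1/2} r
    u = [0.0] * n
    for i in range(n):                                    # (I + eps Lt) u = y
        s = y[i] - eps * sum(Lt[i][j] * u[j] for j in range(i))
        u[i] = s / (1.0 + eps * Lt[i][i])
    v = [0.0] * n
    for i in reversed(range(n)):                          # (I + eps Ut) v = u
        s = u[i] - eps * sum(Ut[i][j] * v[j] for j in range(i + 1, n))
        v[i] = s / (1.0 + eps * Ut[i][i])
    return [v[i] / math.sqrt(W[i]) for i in range(n)]     # W^{-1/2} v


def apply_B(W: Vector, eps: float, Abar: Matrix, z: Vector) -> Vector:
    """Form B z explicitly; used here only to check solve_B."""
    n = len(z)
    Lt, Ut = split_sum(Abar)
    y = [math.sqrt(W[i]) * z[i] for i in range(n)]
    y = [y[i] + eps * sum(Ut[i][j] * y[j] for j in range(n)) for i in range(n)]
    y = [y[i] + eps * sum(Lt[i][j] * y[j] for j in range(n)) for i in range(n)]
    return [math.sqrt(W[i]) * y[i] for i in range(n)]


# Hypothetical data: W = diag(A) and Abar = W^{-1/2} A W^{-1/2}.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
W = [A[i][i] for i in range(3)]
Abar = [[A[i][j] / math.sqrt(W[i] * W[j]) for j in range(3)] for i in range(3)]
r = [1.0, 2.0, 3.0]
z = solve_B(W, 1.0, Abar, r)
print(z)
```

No factorization step appears anywhere in `solve_B`: the triangular arrays are read directly from $\bar{A}$, which is the storage and cost advantage noted above.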


3b. One-pass Multi-component Splitting

Consider a multi-component sum decomposition of $\bar{A}$:

$$\bar{A} = \sum_{i=1}^{n} \bar{A}_i \tag{3.18}$$

Let

$$\begin{aligned}
C &= \prod_{i=1}^{n} (I + \epsilon \bar{A}_i) \\
  &= (I + \epsilon \bar{A}_1)(I + \epsilon \bar{A}_2) \cdots (I + \epsilon \bar{A}_n) \\
  &= I + \epsilon \bar{A} + O(\epsilon^2)
\end{aligned} \tag{3.19}$$

This is a straightforward generalization of the two-component splitting.

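3c. Two-pass Multi-component Splitting

This generalization of the preceding case has qualitative advantages under
certain circumstances (Marchuk [M1]). Let

$$\begin{aligned}
C &= \prod_{i=1}^{n} \left(I + \tfrac{\epsilon}{2} \bar{A}_i\right) \prod_{i=n}^{1} \left(I + \tfrac{\epsilon}{2} \bar{A}_i\right) \\
  &= \left(I + \tfrac{\epsilon}{2} \bar{A}_1\right) \cdots \left(I + \tfrac{\epsilon}{2} \bar{A}_n\right) \left(I + \tfrac{\epsilon}{2} \bar{A}_n\right) \cdots \left(I + \tfrac{\epsilon}{2} \bar{A}_1\right) \\
  &= I + \epsilon \bar{A} + O(\epsilon^2)
\end{aligned} \tag{3.20}$$

If each $\bar{A}_i$ is symmetric and positive semi-definite, then $C$ is
symmetric and positive-definite.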


3d. Element-by-element (EBE) Approximate Factorizations

The EBE approximate factorization is simply a multi-component splitting
in which the components are the finite element arrays themselves. That is, we
assume

$$\bar{A} = \sum_{e=1}^{n_{el}} \bar{A}^e \tag{3.21}$$

Then $C$ may be defined by either the one-pass or two-pass formulae, viz.

$$C = \prod_{e=1}^{n_{el}} (I + \epsilon \bar{A}^e) \tag{3.22}$$

$$C = \prod_{e=1}^{n_{el}} \left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) \prod_{e=n_{el}}^{1} \left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) \quad \text{("Marchuk EBE")} \tag{3.23}$$


Remark 1. We wish to use the term element in the generic sense of a
"subdomain model", where an element could be an individual finite element or
a subassembly of elements. Thus we allow limited assembly. Various equivalent
terminologies have been used to define this concept, such as "substructures"
and "superelements". Subdomain finite element models inherit the symmetry and
definiteness properties of the global array. Consequently, the remark made
after (3.20) applies.

Remark 2. The arrays in (3.22) and (3.23) need to be factorized into
triangular form. This can be done exactly using product decompositions or
approximately using sum decompositions as in Section 3a, equations (3.15)-(3.17).

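one-pass

Corresponding to (3.22) we have

$$C = \prod_{e=1}^{n_{el}} L_p(I + \epsilon \bar{A}^e)\, D_p(I + \epsilon \bar{A}^e)\, U_p(I + \epsilon \bar{A}^e) \quad \text{(product)} \tag{3.24}$$

or

$$C = \prod_{e=1}^{n_{el}} (I + \epsilon \tilde{L}_s(\bar{A}^e)) (I + \epsilon \tilde{U}_s(\bar{A}^e)) \quad \text{(sum)} \tag{3.25}$$

Note (3.24) is identical to (3.22) whereas (3.25) is an approximation of
(3.22).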


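two-pass

Corresponding to (3.23) we have

$$\begin{aligned}
C = {} & \prod_{e=1}^{n_{el}} L_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) D_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) U_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) \\
& \times \prod_{e=n_{el}}^{1} L_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) D_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) U_p\!\left(I + \tfrac{\epsilon}{2} \bar{A}^e\right) \quad \text{(product)}
\end{aligned} \tag{3.26}$$

or

$$\begin{aligned}
C = {} & \prod_{e=1}^{n_{el}} \left(I + \tfrac{\epsilon}{2} \tilde{L}_s(\bar{A}^e)\right) \left(I + \tfrac{\epsilon}{2} \tilde{U}_s(\bar{A}^e)\right) \\
& \times \prod_{e=n_{el}}^{1} \left(I + \tfrac{\epsilon}{2} \tilde{L}_s(\bar{A}^e)\right) \left(I + \tfrac{\epsilon}{2} \tilde{U}_s(\bar{A}^e)\right) \quad \text{(sum)}
\end{aligned} \tag{3.27}$$

Note (3.26) is identical to (3.23) whereas (3.27) is an approximation of
(3.23).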


Whether to use product or sum factorizations of the element arrays is a
question of efficiency. Belytschko and Liu [B1] have proposed a fast exact
inversion procedure for 4-node heat conduction elements. For subassemblies,
the approximate sum factorizations may have advantages.

Remark 3. Storage demands are vastly less in the EBE case. Only one
element at a time need be stored and processed. Whether or not it is desirable
to save factorized element arrays depends upon the availability of high speed
RAM, and the trade-off between CPU and disk I/O costs.
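To illustrate how one element at a time is processed, the sketch below applies the inverse of the two-pass "Marchuk EBE" product (3.23) to a vector by sweeping forward and then backward over the elements, solving a small system on each element's own degrees of freedom. It is a minimal plain-Python illustration for hypothetical two-node elements on a one-dimensional chain; the names (`element_factor_solve`, `marchuk_solve`, and so on), the element array and the value of $\epsilon$ are assumptions, and each 2 x 2 factor is inverted exactly rather than by the triangular factorizations of (3.26).

```python
from typing import List, Tuple

Vector = List[float]
Element = Tuple[Tuple[int, int], List[List[float]]]  # (global dofs, 2x2 array)


def element_factor_solve(z: Vector, dofs: Tuple[int, int],
                         Ae: List[List[float]], c: float) -> None:
    """Overwrite z with (I + c*Abar^e)^{-1} z; only the element's dofs change."""
    i, j = dofs
    m00, m01 = 1.0 + c * Ae[0][0], c * Ae[0][1]
    m10, m11 = c * Ae[1][0], 1.0 + c * Ae[1][1]
    det = m00 * m11 - m01 * m10
    zi, zj = z[i], z[j]
    z[i] = (m11 * zi - m01 * zj) / det
    z[j] = (m00 * zj - m10 * zi) / det


def element_factor_multiply(z: Vector, dofs: Tuple[int, int],
                            Ae: List[List[float]], c: float) -> None:
    """Overwrite z with (I + c*Abar^e) z."""
    i, j = dofs
    zi, zj = z[i], z[j]
    z[i] = zi + c * (Ae[0][0] * zi + Ae[0][1] * zj)
    z[j] = zj + c * (Ae[1][0] * zi + Ae[1][1] * zj)


def marchuk_solve(elements: List[Element], eps: float, r: Vector) -> Vector:
    """Return C^{-1} r for the two-pass product (3.23)."""
    z = list(r)
    for dofs, Ae in elements:                # first pass: e = 1, ..., n_el
        element_factor_solve(z, dofs, Ae, 0.5 * eps)
    for dofs, Ae in reversed(elements):      # second pass: e = n_el, ..., 1
        element_factor_solve(z, dofs, Ae, 0.5 * eps)
    return z


def marchuk_multiply(elements: List[Element], eps: float, z: Vector) -> Vector:
    """Return C z; used here only to check marchuk_solve."""
    y = list(z)
    for dofs, Ae in elements:
        element_factor_multiply(y, dofs, Ae, 0.5 * eps)
    for dofs, Ae in reversed(elements):
        element_factor_multiply(y, dofs, Ae, 0.5 * eps)
    return y


# Hypothetical mesh: three two-node elements on four nodes.
Ae = [[1.0, -1.0], [-1.0, 1.0]]
elements = [((0, 1), Ae), ((1, 2), Ae), ((2, 3), Ae)]
r = [1.0, 2.0, 3.0, 4.0]
z = marchuk_solve(elements, 0.5, r)
print(z)
```

No global matrix is ever formed: the working data at any moment are the vector `z` and a single element array.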

Remark 4. The ordering of the factors influences how well $C$ approximates
$I + \epsilon \bar{A}$. The global product decomposition,

$$I + \epsilon \bar{A} = L_p(I + \epsilon \bar{A})\, D_p(I + \epsilon \bar{A})\, U_p(I + \epsilon \bar{A}), \tag{3.28}$$

suggests that it might be worthwhile to reorder the factors in (3.24)-(3.27)
such that all lower triangular factors precede diagonals, which in turn precede
upper triangular factors. This results in the following "reordered" schemes:

$$C = \prod_{e=1}^{n_{el}} L_p(I + \epsilon \bar{A}^e) \times \prod_{e=1}^{n_{el}} D_p(I + \epsilon \bar{A}^e) \times \prod_{e=n_{el}}^{1} U_p(I + \epsilon \bar{A}^e) \quad \text{("Crout EBE")} \tag{3.29}$$

$$C = \prod_{e=1}^{n_{el}} (I + \epsilon \tilde{L}_s(\bar{A}^e)) \times \prod_{e=n_{el}}^{1} (I + \epsilon \tilde{U}_s(\bar{A}^e)) \quad \text{("symm. Gauss-Seidel EBE")} \tag{3.30}$$

Note that in the case of symmetric $\bar{A}$, symmetry is preserved by (3.29)
and (3.30). Thus there seems little motivation for similarly reordering the
two-pass versions.

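In the case of positive $D_p(I + \epsilon \bar{A}^e)$, the Crout
factorizations can be reordered in terms of Cholesky factors. For example, a
variant of (3.29) is

$$C = \prod_{e=1}^{n_{el}} \tilde{L}_p(I + \epsilon \bar{A}^e) \times \prod_{e=n_{el}}^{1} \tilde{U}_p(I + \epsilon \bar{A}^e) \quad \text{("Cholesky EBE")} \tag{3.31}$$

Note that (3.31) and (3.29) are not generally identical.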


Remark 5. If elements are segregated into non-contiguous subgroups then
calculations are parallelizable. For example, brick-like domains can be
decomposed into eight non-contiguous element groups (see Figure 1). Because the
elements in each subgroup have no common degrees-of-freedom, they can be
processed in parallel. The eight groups, however, need to be processed
sequentially. For analogous two-dimensional domains, four element groups need
to be employed.
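The two-dimensional grouping can be sketched in a few lines. The code below is a minimal plain-Python illustration, assuming a structured grid of four-node quadrilaterals with row-by-row node numbering; the function names and the numbering convention are hypothetical. Elements are assigned to one of four groups according to the parity of their grid indices, and a check confirms that no two elements in a group share a node.

```python
from typing import Dict, List, Tuple


def element_nodes(i: int, j: int, nx: int) -> Tuple[int, int, int, int]:
    """Node numbers of quadrilateral (i, j) on a grid with nx elements per row."""
    n0 = j * (nx + 1) + i
    return (n0, n0 + 1, n0 + nx + 2, n0 + nx + 1)


def color_groups(nx: int, ny: int) -> Dict[int, List[Tuple[int, int]]]:
    """Split an nx-by-ny element grid into four groups by index parity."""
    groups: Dict[int, List[Tuple[int, int]]] = {c: [] for c in range(4)}
    for j in range(ny):
        for i in range(nx):
            groups[(i % 2) + 2 * (j % 2)].append((i, j))
    return groups


def group_is_independent(group: List[Tuple[int, int]], nx: int) -> bool:
    """True if no two elements of the group have a node in common."""
    seen = set()
    for i, j in group:
        nodes = element_nodes(i, j, nx)
        if seen.intersection(nodes):
            return False
        seen.update(nodes)
    return True


groups = color_groups(16, 16)   # a 256-element mesh
for color, members in groups.items():
    print(color, len(members), group_is_independent(members, 16))
```

For a 16 x 16 mesh each of the four groups holds 64 mutually independent elements, so the element operations within a group could be carried out concurrently while the groups themselves are visited one after another.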

Remark 6. It has been our computational experience that if $A$ is symmetric
and positive-definite, then qualitatively faithful approximate factorizations,
which preserve these properties, perform much better than those that do not.


4. Selection of $W$, $\epsilon$ and $\bar{A}$

The following two definitions of $W$, $\epsilon$ and $\bar{A}$ have been
employed:

a.) This choice is motivated by the derivation of the PR algorithm (see
[H5]):

$$W = D_s(A) \tag{4.1}$$

$$\bar{A} = W^{-1/2} A\, W^{-1/2} \tag{4.2}$$

Thus

$$\tilde{A} = D_s(A) + \epsilon A \tag{4.3}$$

b.) In this case

$$W = D_s(A) \tag{4.4}$$

$$\bar{A} = \frac{1}{\epsilon} W^{-1/2} (A - D_s(A)) W^{-1/2} \tag{4.5}$$

which leads to

$$\tilde{A} = A \tag{4.6}$$

This procedure was proposed in Winget [W1].
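Both results follow by direct substitution into (3.11). The short verification below assumes that (3.11) has the form $\tilde{A} = W^{1/2}(I + \epsilon \bar{A})W^{1/2}$ and that $W = D_s(A)$ in both cases; it is a worked check added for the reader, not a step quoted from the original.

```latex
\begin{aligned}
\text{a.)}\quad \tilde{A}
  &= W^{1/2}\left(I + \epsilon\, W^{-1/2} A\, W^{-1/2}\right) W^{1/2}
   = W + \epsilon A
   = D_s(A) + \epsilon A, \\
\text{b.)}\quad \tilde{A}
  &= W^{1/2}\left(I + W^{-1/2}\,\bigl(A - D_s(A)\bigr)\,W^{-1/2}\right) W^{1/2}
   = D_s(A) + A - D_s(A)
   = A.
\end{aligned}
```

In case b.) the factor $1/\epsilon$ in (4.5) cancels the $\epsilon$ of (3.11), so the approximation $\tilde{A}$ is independent of $\epsilon$.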


Remark 1.

Matrices of the form $\tilde{A} = D_s(A) + \epsilon A$ were introduced in [H5].
Nour-Omid and Parlett [N1] analytically investigated the effectiveness of
matrices of this type on a model problem and concluded that the optimal value
of $\epsilon$ is $\epsilon \to \infty$. This limit is attained by the
definitions (4.4) and (4.5).

Remark 2.

The implicit-explicit finite element concept [H2, H3, H6-H9] has a
simple and clean implementation within EBE approximate factorizations.
Assume that an explicit element contributes only its diagonal mass matrix to
the coefficient matrix $A$. Thus $W$, according to any one of the preceding
definitions, totally accounts for the explicit element contributions and the
corresponding $\bar{A}^e$'s are identically zero. What this means is that
explicit elements may be simply omitted from the formula for $C$. In nonlinear
problems this opens the way to time-adaptive implicit-explicit element
partitions. In calculating the element contributions to the residual (i.e.
"b"), a check can be made whether or not the critical time step is exceeded for
the element. If it is not exceeded, a flag is set to indicate that element
contributions to $C$ may be simply ignored. The potential savings in nonlinear
transient analysis procedures incorporating these ideas are clearly
considerable.


5. Sample Problems


The computed results were obtained on a VAX computer in single precision
(32 bits per floating point word). Critical time steps were computed from
$\Delta t_{crit} = 2 / \lambda^e_{max}$, where $\lambda^e_{max}$ is the maximum
element eigenvalue. Unless otherwise noted, bilinear quadrilaterals were
employed with 2 x 2 Gauss integration.


NASA Insulated Structure Test Problem

The problem description is illustrated in Figure 2. A number of
comparisons of the various techniques proposed were made for this problem. In
the first comparison table below, the selection of $\tilde{A} = A$ [W1] is seen
to converge faster than $\tilde{A} = D_s(A) + \epsilon A$. In addition, the
steady-state residual potential energy, measured by $\log_{10}(-P_f)$, attains
a smaller value when $\tilde{A} = A$. This and other calculations have
indicated that $\tilde{A} = A$ is the superior choice. It is used in the
comparisons shown in the second table. The first observation which may be made
here is that the CG algorithm is more effective than PR with line searches.
The use of BFGS updates would doubtless improve upon the performance of PR;
however, the increased data pool required to store the BFGS vectors is a
significant disadvantage. Thus our current preference in symmetric
positive-definite cases is the CG method.

The EBE factorizations, ranked from best to worst, are: Crout, Cholesky,
Marchuk, and symmetrized Gauss-Seidel. Nevertheless, it must be kept in mind
that overall computational efficiency may alter this ordering. For example,
although symmetrized Gauss-Seidel was the slowest to converge, it does not
require element factorization, an advantage. A final point to observe is that
convergence is typically slower during the larger time step sequences (i.e.
steps 21-50) than the smaller step sequences (i.e. steps 1-20). There appear
to be two reasons for this. Firstly, for the larger steps the solution closely
approximates the steady state. Thus the initial residual is fairly small,
resulting in a more stringent convergence criterion. (A more reasonable
convergence criterion would no doubt result in faster termination for the
larger steps. In fact, even the "non-converged" solutions possess adequate
accuracy from a practical standpoint.) Secondly, the conditioning of the
element factors deteriorates for larger steps in that the element arrays
become nearly singular.

Table: Comparison of $\tilde{A} = D_s(A) + \epsilon A$ with $\tilde{A} = A$.
Algorithm: PR + LS (no BFGS). Approximate factorization: Marchuk EBE.

                                                  avg. its. per step
$\tilde{A}$              $\log_{10}(-P_f)$ (*)   steps 1-20    steps 21-50 (**)
$D_s(A) + \epsilon A$         -11.0              (illegible)        10
$A$                           -15.4                5.01             10

Table: Comparison of PR and CG algorithms and various EBE approximate
factorizations. In each case $\tilde{A} = A$.

Algorithm: PR + LS (no BFGS)

                                                  avg. its. per step
EBE approx. fact.        $\log_{10}(-P_f)$ (*)   steps 1-20    steps 21-50 (**)
symm. Gauss-Seidel            -13.4                5.41             10
Marchuk                       -15.4                5.03             10
Cholesky                      -15.8                3.95             10
Crout                         -15.1                3.95             10

Algorithm: CG

                                                  avg. its. per step
EBE approx. fact.        $\log_{10}(-P_f)$ (*)   steps 1-20    steps 21-50 (**)
symm. Gauss-Seidel          (illegible)            3.95             9.0
Marchuk                     (illegible)            3.50             8.1
Cholesky                      -25.3                3.45             7.9
Crout                         -25.1                2.95             8.0

Notes:

(*) $P_f$ is the final value of the potential energy. The note is only partly
legible in the source; consistent with the discussion above, smaller values of
$\log_{10}(-P_f)$ indicate a better steady-state result.

(**) Note illegible in the source.


Parallel/Sequential Test Problem

The problem description is given in Figure 3. The purpose of this problem
is to compare convergence characteristics for "natural" element orderings,
which necessitate sequential processing, with orderings that lend themselves
to parallel computations. The comparisons were all performed with the CG
algorithm, $\tilde{A} = A$ and the Cholesky EBE approximate factorization.

Over the thirty time steps the sequential ordering averaged 2.53 iterations
per step to attain convergence, whereas the parallel ordering averaged 3.47
iterations. Despite the fact that the parallel ordering is slower, which might
be anticipated, the fact that it is reasonably fast is extremely encouraging.
For the 256 element mesh shown, a 64-processor computer could attain speeds 64
times faster than a single processor. This more than compensates for the
somewhat slower convergence of the parallel ordering. The gains in larger
problems are potentially even more spectacular.
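A rough estimate shows why the extra iterations are more than compensated. The arithmetic below is a back-of-the-envelope sketch only: it assumes that the cost of one iteration is the same for both orderings, that the 64 processors are used with perfect efficiency, and that communication costs are negligible. None of these assumptions is examined in the text, and $S$ is a label introduced here for the estimated net speedup.

```latex
S \;\approx\; 64 \times \frac{2.53}{3.47} \;\approx\; 46.7
```

Under these idealized assumptions the parallel ordering would still finish a time step roughly 47 times faster than the sequential ordering on one processor.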
6. Conclusions

The numerical comparisons between the PR and CG algorithms indicate that
the CG algorithm is superior to PR. It is likely that BFGS updates would have
improved the performance of PR; however, the need to store the BFGS vectors is
considered a serious drawback when compared with the small, fixed-storage
requirements of the CG algorithm.

Among the EBE approximate factorizations, the Crout variant seemed best;
however, the Cholesky, Marchuk, and symmetrized Gauss-Seidel versions were also
effective, and thus a preference for one over another may need to be based on
other computational considerations.

The calculations comparing parallel and sequential orderings are very
exciting. The preliminary indications are that parallel processing with EBE
factorizations may be a very efficient computational strategy.

Although a number of possible improvements may still be envisioned, we
believe that the methodology developed is at a stage where it may be
incorporated as an option in production heat conduction codes. Future hardware
developments, such as parallel multi-processor computers, promise to further
enhance the performance of EBE techniques on large problems.
Appendix 1. Derivation of Linear Algebraic Systems in the Finite Element
Analysis of Heat Conduction Problems

Consider the following semi-discrete system:

$$M a + C v = F \tag{1.1}$$

where $M$ is the capacity matrix, $C$ is the conductivity matrix, $F$ is the
heat-supply vector, $v$ is the temperature vector and $a = \dot{v}$ is the
time-rate-of-temperature vector. We assume that $M$ and $C$ are symmetric and
positive definite, and they may depend upon $v$ and $t$ (time). We employ a
predictor-multicorrector method to time-discretize the system. To this end
we define the temperature "predictor" by

$$\tilde{v}_{n+1} = v_n + (1 - \gamma) \Delta t \, a_n \tag{1.2}$$

where subscripts refer to the step number; $\Delta t$ is the time step; $v_n$
and $a_n$ are the approximations to $v(t_n)$ and $a(t_n)$, respectively; and
$\gamma$ is a parameter governing stability and accuracy characteristics of the
algorithm [H3]. Calculations begin with the initial data $v_0$ and $a_0$;
$a_0$ may be calculated from

$$M_0 a_0 = F_0 - C_0 v_0 \tag{1.3}$$

In each time step a nonlinear algebraic problem arises which may be
solved by Newton-type iterative procedures:

$$i = 0 \quad \text{($i$ is the iteration counter)} \tag{1.4}$$

Predictor phase:

$$v^{(i)}_{n+1} = \tilde{v}_{n+1} \tag{1.5}$$

$$a^{(i)}_{n+1} = 0 \tag{1.6}$$

$$R = F_{n+1} - M^{(i)}_{n+1} a^{(i)}_{n+1} - C^{(i)}_{n+1} v^{(i)}_{n+1} \quad \text{(residual)} \tag{1.7}$$

$$M^* = M^{(i)}_{n+1} + \gamma \Delta t \, C^{(i)}_{n+1} \quad \text{(effective capacity)} \tag{1.8}$$

$$M^* \Delta a = R \tag{1.9}$$

Corrector phase:

$$a^{(i+1)}_{n+1} = a^{(i)}_{n+1} + \Delta a \tag{1.10}$$

$$v^{(i+1)}_{n+1} = \tilde{v}_{n+1} + \gamma \Delta t \, a^{(i+1)}_{n+1} \tag{1.11}$$

If additional iterations are to be performed, $i$ is replaced by $i + 1$,
and calculations resume with (1.7). Either a fixed number of iterations may
be performed, or iterating may be terminated when $\Delta a$ and/or $R$ satisfy
preassigned convergence conditions. When the iterative phase is completed, the
solution at step $n+1$ is defined by the last iterates (viz.
$v_{n+1} = v^{(i+1)}_{n+1}$ and $a_{n+1} = a^{(i+1)}_{n+1}$). At this point,
$n$ is replaced by $n+1$, and calculations for the next time step may begin.

So-called implicit-explicit element partitions [H2, H3, H6-H9] may be
encompassed by the above formulation simply by excluding explicit element
contributions from $C$. A totally explicit formulation is attained by ignoring
$C$. In these cases it is necessary to employ a diagonal capacity matrix in
explicit regions to attain full computational efficiency.
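The predictor-multicorrector loop can be sketched for a single degree of freedom as follows. This is a minimal plain-Python illustration that assumes constant scalar capacity `m`, conductivity `c` and heat supply `f`, so that $M^*$ is a number and the solve in (1.9) is a division. The function name, the argument names and the sample values are hypothetical, and the corrector is written in the form $v = \tilde{v} + \gamma \Delta t\, a$.

```python
from typing import Tuple


def predictor_multicorrector_step(m: float, c: float, f: float,
                                  v_n: float, a_n: float,
                                  dt: float, gamma: float,
                                  n_iter: int = 2) -> Tuple[float, float]:
    """Advance m*a + c*v = f by one time step; returns (v_{n+1}, a_{n+1})."""
    v_pred = v_n + (1.0 - gamma) * dt * a_n      # temperature predictor (1.2)
    v, a = v_pred, 0.0                           # predictor phase (1.5), (1.6)
    for _ in range(n_iter):
        R = f - m * a - c * v                    # residual (1.7)
        m_star = m + gamma * dt * c              # effective capacity (1.8)
        da = R / m_star                          # solve M* da = R (1.9)
        a = a + da                               # corrector (1.10)
        v = v_pred + gamma * dt * a              # corrector (1.11)
    return v, a


# Hypothetical data: unit capacity and conductivity, no heat supply.
m, c, f = 1.0, 1.0, 0.0
v0 = 1.0
a0 = (f - c * v0) / m                            # initial rate from (1.3)
v1, a1 = predictor_multicorrector_step(m, c, f, v0, a0, dt=0.1, gamma=0.5)
print(v1, a1)
```

For this linear problem the residual vanishes after the first pass, so the second pass leaves the solution unchanged; with $\gamma = 1/2$ the step reproduces the Crank-Nicolson update $v_1 = v_0 (1 - \Delta t/2)/(1 + \Delta t/2)$.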

To simplify the writing in the body of this paper we adopt the following
notation in place of (1.9):

$$A x = b \tag{1.12}$$

Thus during each step, at each iteration, we wish to solve (1.12) in which $A$
is assembled from element arrays, that is

$$A = \sum_{e=1}^{n_{el}} A^e \tag{1.13}$$
Figure 1. Decomposition of three-dimensional domain into eight
groups of brick elements for parallel processing.

Figure 3. Problem description for parallel/sequential comparison.




